Analytics over Probabilistic Unmerged Duplicates
Abstract
This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases that store probabilistic information about which instances have been found to describe the same real-world object. We discuss the need to query such databases efficiently and to support practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and techniques for processing queries over such probabilistic databases efficiently, in particular without materializing the (potentially huge) collection of all possible deduplication worlds.
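To make the cost of materializing deduplication worlds concrete, here is a minimal toy sketch. It is not the paper's method. It assumes a hypothetical setting in which every candidate duplicate pair is independent and no record appears in more than one pair, so each merge reduces the entity count by exactly one. The function names and the sample probabilities are illustrative assumptions. Under those assumptions, the expected number of distinct entities can be computed either by enumerating all 2^k worlds or directly by linearity of expectation.

```python
from itertools import product


def expected_entities_enumerate(n_records, pair_probs):
    """Expected entity count by enumerating every deduplication world.

    Assumes (hypothetically) independent, disjoint candidate pairs.
    Cost grows as 2 ** len(pair_probs).
    """
    expected = 0.0
    for world in product([False, True], repeat=len(pair_probs)):
        weight = 1.0   # probability of this world
        merged = 0     # number of pairs merged in this world
        for is_merged, p in zip(world, pair_probs):
            weight *= p if is_merged else 1.0 - p
            merged += int(is_merged)
        expected += weight * (n_records - merged)
    return expected


def expected_entities_direct(n_records, pair_probs):
    """Same quantity without materializing worlds (linearity of expectation)."""
    return n_records - sum(pair_probs)


# Illustrative data: 6 records, 3 disjoint candidate pairs.
pair_probs = [0.9, 0.5, 0.2]
n_records = 6

by_worlds = expected_entities_enumerate(n_records, pair_probs)
direct = expected_entities_direct(n_records, pair_probs)
print(by_worlds, direct)
```

Both functions return the same value up to floating-point rounding, but the first touches 2^k worlds while the second touches k probabilities. Real duplicate evidence is rarely independent or disjoint, so this shortcut covers only the toy case.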
Similar resources
Multi-granulation fuzzy probabilistic rough sets and their corresponding three-way decisions over two universes
This article introduces a general framework of multi-granulation fuzzy probabilistic rough set (MG-FPRS) models in a multi-granulation fuzzy probabilistic approximation space over two universes. Four types of MG-FPRSs are established, based on four different conditional probabilities of a fuzzy event. For different constraints on parameters, we obtain four kinds of each type MG-FPRSs...
A Statistical Data Fusion Technique in Virtual Data Integration Environment
Data fusion in the virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real world object. In this paper, a statistical technique for data fusion is introduced based on some proba...
Policy Analytics Generation Using Action Probabilistic Logic Programs
Action probabilistic logic programs (ap-programs for short) [15] are a class of the extensively studied family of probabilistic logic programs [14,21,22]. ap-programs have been used extensively to model and reason about the behavior of groups, and an application based on ap-programs for reasoning about terror groups has users from over 12 US government entities [10]. ap-programs use a two sorted...
$Υ$-DB: A system for data-driven hypothesis management and analytics
The vision of Υ-DB introduces deterministic scientific hypotheses as a kind of uncertain and probabilistic data, and opens some key technical challenges for enabling data-driven hypothesis management and analytics. The Υ-DB system addresses those challenges throughout a design-by-synthesis pipeline that defines its architecture. It processes hypotheses from their XML-based extraction to encodin...